Stochastic Inversion Transduction Grammars for Obtaining Word Phrases for Phrase-based Statistical Machine Translation
Abstract
An important problem in phrase-based statistical translation models is obtaining word phrases from an aligned bilingual training corpus. In this work, we propose obtaining word phrases by means of a Stochastic Inversion Transduction Grammar. Experiments were carried out on the shared task proposed in this workshop with the Europarl corpus, and good results were obtained.
Similar resources
Obtaining Word Phrases with Stochastic Inversion Transduction Grammars for Phrase-based Statistical Machine Translation
Phrase-based statistical translation systems are currently providing excellent results in real machine translation tasks. In phrase-based statistical translation systems, the basic translation units are word phrases. An important problem that is related to the estimation of phrase-based statistical models is the obtaining of word phrases from an aligned bilingual training corpus. In this work, ...
Joint Phrase Alignment and Extraction for Statistical Machine Translation
The phrase table, a scored list of bilingual phrases, lies at the center of phrase-based machine translation systems. We present a method to directly learn this phrase table from a parallel corpus of sentences that are not aligned at the word level. The key contribution of this work is that while previous methods have generally only modeled phrases at one level of granularity, in the proposed m...
Learning Stochastic Bracketing Inversion Transduction Grammars with a Cubic Time Biparsing Algorithm
We present a biparsing algorithm for Stochastic Bracketing Inversion Transduction Grammars that runs in O(bn^3) time instead of O(n^6). Transduction grammars learned via an EM estimation procedure based on this biparsing algorithm are evaluated directly on the translation task, by building a phrase-based statistical MT system on top of the alignments dictated by Viterbi parses under the induced b...
An Unsupervised Model for Joint Phrase Alignment and Extraction
We present an unsupervised model for joint phrase alignment and extraction using nonparametric Bayesian methods and inversion transduction grammars (ITGs). The key contribution is that phrases of many granularities are included directly in the model through the use of a novel formulation that memorizes phrases generated not only by terminal, but also non-terminal symbols. This allows for a comp...
Word Alignment with Stochastic Bracketing Linear Inversion Transduction Grammar
The class of Linear Inversion Transduction Grammars (LITGs) is introduced, and used to induce a word alignment over a parallel corpus. We show that alignment via Stochastic Bracketing LITGs is considerably faster than Stochastic Bracketing ITGs, while still yielding alignments superior to the widely used heuristic of intersecting bidirectional IBM alignments. Performance is measured as the trans...
Publication date: 2006